This report details an exploration of the modeltime package for time series forecasting in R.
The process of producing forecasts for time series data in general can be broken down into a few steps.
The data can be retrieved using tidyquant or quantmod. No pre-processing is needed, as the data will already be in a cleaned format. Wrangling and visualization of time series are handled by the timetk package. Classical forecasting models have traditionally come from the forecast package, which has now been deprecated and replaced by the fable package. Machine learning models can be deployed using the tidymodels framework with its machine-learning-focused packages and toolkit. modeltime bridges the two worlds: it integrates with the tidyverse collection of packages (notably lubridate and timetk) and lets the user tap into the machine learning ecosystem of tidymodels, particularly its parsnip models.
packages <- c('tidyverse', 'lubridate', 'timetk', 'modeltime', 'tidymodels', 'tidyquant', 'glmnet', 'randomForest', 'earth')
for (p in packages){
  if (!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The objective is to make the application stock agnostic, meaning it can be used with any stock rather than one specific ticker. For the purposes of this report, we will use Apple's stock (AAPL).
stock <- tq_get("AAPL", get = "stock.prices", from = "2020-01-01")
We're only interested in the closing price from the start of this year, 2021. Pre-processing consists of differencing the series to remove the trend; the leading NA that differencing produces is replaced with 0.
stock_tbl <- stock %>%
  select(date, close) %>%
  filter(date >= "2021-01-01") %>%
  set_names(c("date", "value")) %>%
  mutate(value = diff_vec(value)) %>%
  mutate(value = replace_na(value, 0))
## diff_vec(): Initial values: 129.410004
stock_tbl
## # A tibble: 67 x 2
## date value
## <date> <dbl>
## 1 2021-01-04 0
## 2 2021-01-05 1.60
## 3 2021-01-06 -4.41
## 4 2021-01-07 4.32
## 5 2021-01-08 1.13
## 6 2021-01-11 -3.07
## 7 2021-01-12 -0.180
## 8 2021-01-13 2.09
## 9 2021-01-14 -1.98
## 10 2021-01-15 -1.77
## # ... with 57 more rows
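The diff_vec() step above is first-order differencing. It can be illustrated with base R's diff(), using approximate closing prices reconstructed from the first few rows of the output:

```r
# Approximate closing prices for 2021-01-04 through 2021-01-07
prices <- c(129.41, 131.01, 126.60, 130.92)

# First-order differencing: each value minus the previous one
diff(prices)  # ~ 1.60, -4.41, 4.32

# timetk's diff_vec() keeps the original series length by padding the front
# with NA, which is why the pipeline follows it with replace_na(value, 0)
c(NA, diff(prices))
```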
Plotting the stock’s historical data
stock_tbl %>%
plot_time_series(date, value, .interactive = TRUE)
Split the data, holding out the last 3 weeks as validation data
splits <- stock_tbl %>%
time_series_split(assess = "3 weeks", cumulative = TRUE)
## Using date_var: date
splits %>%
tk_time_series_cv_plan() %>%
plot_time_series_cv_plan(date, value, .interactive = TRUE)
Since the date column is meaningless to the machine learning models, some feature engineering is required to turn it into usable predictors. This can be done with step_timeseries_signature() from the timetk package, which adds a step to a recipe object that expands the date into a set of calendar features. Further recipe steps can then be chained on, such as removing unused columns and creating dummy variables for categorical features.
For the parsnip models, the date column is additionally given an ID role so that it is excluded from the predictors:
recipe_spec <- recipe(value ~ date, training(splits)) %>%
step_timeseries_signature(date) %>%
step_rm(contains("am.pm"), contains("hour"), contains("minute"),
contains("second"), contains("xts"), contains("half"),
contains(".iso")) %>%
#step_mutate(value = replace_na(value, 0)) %>%
step_normalize(date_index.num) %>%
#step_fourier(date, period = 365, K = 5) %>%
step_dummy(all_nominal())
recipe_spec_parsnip <- recipe_spec %>%
update_role(date, new_role = "ID")
bake(recipe_spec %>% prep(), new_data = NULL)
## # A tibble: 53 x 34
## date value date_index.num date_year date_quarter date_month date_day
## <date> <dbl> <dbl> <int> <int> <int> <int>
## 1 2021-01-04 0 -1.65 2021 1 1 4
## 2 2021-01-05 1.60 -1.61 2021 1 1 5
## 3 2021-01-06 -4.41 -1.57 2021 1 1 6
## 4 2021-01-07 4.32 -1.52 2021 1 1 7
## 5 2021-01-08 1.13 -1.48 2021 1 1 8
## 6 2021-01-11 -3.07 -1.34 2021 1 1 11
## 7 2021-01-12 -0.180 -1.30 2021 1 1 12
## 8 2021-01-13 2.09 -1.26 2021 1 1 13
## 9 2021-01-14 -1.98 -1.21 2021 1 1 14
## 10 2021-01-15 -1.77 -1.17 2021 1 1 15
## # ... with 43 more rows, and 27 more variables: date_wday <int>,
## # date_mday <int>, date_qday <int>, date_yday <int>, date_mweek <int>,
## # date_week <int>, date_week2 <int>, date_week3 <int>, date_week4 <int>,
## # date_mday7 <int>, date_month.lbl_01 <dbl>, date_month.lbl_02 <dbl>,
## # date_month.lbl_03 <dbl>, date_month.lbl_04 <dbl>, date_month.lbl_05 <dbl>,
## # date_month.lbl_06 <dbl>, date_month.lbl_07 <dbl>, date_month.lbl_08 <dbl>,
## # date_month.lbl_09 <dbl>, date_month.lbl_10 <dbl>, date_month.lbl_11 <dbl>,
## # date_wday.lbl_1 <dbl>, date_wday.lbl_2 <dbl>, date_wday.lbl_3 <dbl>,
## # date_wday.lbl_4 <dbl>, date_wday.lbl_5 <dbl>, date_wday.lbl_6 <dbl>
Creating and fitting an ARIMA model using modeltime
model_fit_arima <- arima_reg() %>%
set_engine("auto_arima") %>%
fit(value ~ date, training(splits))
model_fit_arima
## parsnip model object
##
## Fit time: 271ms
## Series: outcome
## ARIMA(0,0,0)(1,0,0)[5] with zero mean
##
## Coefficients:
## sar1
## -0.3853
## s.e. 0.1307
##
## sigma^2 estimated as 6.399: log likelihood=-124.29
## AIC=252.58 AICc=252.82 BIC=256.52
Similarly, we use modeltime to create and train a Prophet model
workflow_fit_prophet <- workflow() %>%
add_model(
prophet_reg() %>% set_engine("prophet")
) %>%
add_recipe(recipe_spec) %>%
fit(training(splits))
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
workflow_fit_prophet
## == Workflow [trained] ==========================================================
## Preprocessor: Recipe
## Model: prophet_reg()
##
## -- Preprocessor ----------------------------------------------------------------
## 4 Recipe Steps
##
## * step_timeseries_signature()
## * step_rm()
## * step_normalize()
## * step_dummy()
##
## -- Model -----------------------------------------------------------------------
## PROPHET w/ Regressors Model
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
## - extra_regressors: 30
Using the new features, we can start creating and training machine learning models, beginning with a linear regression model. Setting the mixture argument lets us configure the model as ridge (mixture = 0), lasso (mixture = 1), or elastic net (anything in between). Here, we'll use an elastic net for our experiment.
model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
set_engine("glmnet")
workflow_fit_glmnet <- workflow() %>%
add_model(model_spec_glmnet) %>%
add_recipe(recipe_spec_parsnip) %>%
fit(training(splits))
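As a reminder of what the mixture argument controls, glmnet blends the ridge and lasso penalty terms. A minimal base-R sketch of that penalty (the function name is illustrative, not part of glmnet):

```r
# Elastic-net penalty, as blended by glmnet:
# penalty * ((1 - mixture)/2 * sum(beta^2) + mixture * sum(abs(beta)))
elastic_net_penalty <- function(beta, penalty, mixture) {
  penalty * ((1 - mixture) / 2 * sum(beta^2) + mixture * sum(abs(beta)))
}

beta <- c(0.5, -0.25, 0)
elastic_net_penalty(beta, penalty = 0.01, mixture = 0)    # pure ridge
elastic_net_penalty(beta, penalty = 0.01, mixture = 1)    # pure lasso
elastic_net_penalty(beta, penalty = 0.01, mixture = 0.5)  # elastic net, as used above
```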
A random forest model can be set up similarly to the linear regression model.
model_spec_rf <- rand_forest(trees = 500, min_n = 50) %>%
set_engine("randomForest")
workflow_fit_rf <- workflow() %>%
add_model(model_spec_rf) %>%
add_recipe(recipe_spec_parsnip) %>%
fit(training(splits))
An XGBoost model is set up the same way
workflow_fit_xgboost <- workflow() %>%
add_model(
boost_tree() %>% set_engine("xgboost")
) %>%
add_recipe(recipe_spec_parsnip) %>%
fit(training(splits))
workflow_fit_xgboost
## == Workflow [trained] ==========================================================
## Preprocessor: Recipe
## Model: boost_tree()
##
## -- Preprocessor ----------------------------------------------------------------
## 4 Recipe Steps
##
## * step_timeseries_signature()
## * step_rm()
## * step_normalize()
## * step_dummy()
##
## -- Model -----------------------------------------------------------------------
## ##### xgb.Booster
## raw: 35.9 Kb
## call:
## xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0,
## colsample_bytree = 1, min_child_weight = 1, subsample = 1,
## objective = "reg:squarederror"), data = x$data, nrounds = 15,
## watchlist = x$watchlist, verbose = 0, nthread = 1)
## params (as set within xgb.train):
## eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", min_child_weight = "1", subsample = "1", objective = "reg:squarederror", nthread = "1", validate_parameters = "TRUE"
## xgb.attributes:
## niter
## callbacks:
## cb.evaluation.log()
## # of features: 32
## niter: 15
## nfeatures : 32
## evaluation_log:
## iter training_rmse
## 1 2.260684
## 2 1.881258
## ---
## 14 0.448955
## 15 0.409669
The last machine learning model to be tested is an SVM with an RBF kernel.
model_spec_svm <- svm_rbf() %>%
  set_engine("kernlab") %>%
  set_mode("regression")
workflow_fit_svm <- workflow() %>%
  add_model(model_spec_svm) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
A boosted ARIMA model, which uses XGBoost to model the ARIMA residuals
workflow_fit_arima_boosted <- workflow() %>%
add_model(
arima_boost(min_n = 2, learn_rate = 0.015) %>%
set_engine(engine = "auto_arima_xgboost")
) %>%
add_recipe(recipe_spec) %>%
fit(training(splits))
A hybrid model of Prophet and XGBoost is also supported by modeltime.
model_spec_prophet_boost <- prophet_boost(seasonality_weekly = FALSE,
seasonality_daily = FALSE,
seasonality_yearly = FALSE) %>%
set_engine("prophet_xgboost")
workflow_fit_prophet_boost <- workflow() %>%
add_model(model_spec_prophet_boost) %>%
add_recipe(recipe_spec) %>%
fit(training(splits))
workflow_fit_prophet_boost
## == Workflow [trained] ==========================================================
## Preprocessor: Recipe
## Model: prophet_boost()
##
## -- Preprocessor ----------------------------------------------------------------
## 4 Recipe Steps
##
## * step_timeseries_signature()
## * step_rm()
## * step_normalize()
## * step_dummy()
##
## -- Model -----------------------------------------------------------------------
## PROPHET w/ XGBoost Errors
## ---
## Model 1: PROPHET
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'FALSE'
## - weekly.seasonality: 'FALSE'
## - daily.seasonality: 'FALSE'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
##
## ---
## Model 2: XGBoost Errors
##
## xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0,
## colsample_bytree = 1, min_child_weight = 1, subsample = 1,
## objective = "reg:squarederror"), data = x$data, nrounds = 15,
## watchlist = x$watchlist, verbose = 0, nthread = 1)
Once all the models have been fitted, they are combined into a Model Table using modeltime_table(), which makes it easy to evaluate and visualize the performance of the models, as well as to forecast with all of them at once.
model_table <- modeltime_table(
model_fit_arima,
workflow_fit_arima_boosted,
workflow_fit_prophet,
workflow_fit_prophet_boost,
#model_fit_ets,
workflow_fit_glmnet,
workflow_fit_rf,
workflow_fit_svm,
workflow_fit_xgboost
)
model_table
## # Modeltime Table
## # A tibble: 8 x 3
## .model_id .model .model_desc
## <int> <list> <chr>
## 1 1 <fit[+]> ARIMA(0,0,0)(1,0,0)[5] WITH ZERO MEAN
## 2 2 <workflow> ARIMA(0,0,0)(1,0,0)[5] WITH ZERO MEAN W/ XGBOOST ERRORS
## 3 3 <workflow> PROPHET W/ REGRESSORS
## 4 4 <workflow> PROPHET W/ XGBOOST ERRORS
## 5 5 <workflow> GLMNET
## 6 6 <workflow> RANDOMFOREST
## 7 7 <workflow> KERNLAB
## 8 8 <workflow> XGBOOST
modeltime_calibrate() then computes test-set predictions and residuals for each model, which are used downstream for accuracy metrics and confidence intervals.
calibration_table <- model_table %>%
  modeltime_calibrate(testing(splits))
calibration_table
## # Modeltime Table
## # A tibble: 8 x 5
## .model_id .model .model_desc .type .calibration_da~
## <int> <list> <chr> <chr> <list>
## 1 1 <fit[+]> ARIMA(0,0,0)(1,0,0)[5] WITH ZERO M~ Test <tibble [14 x 4~
## 2 2 <workflo~ ARIMA(0,0,0)(1,0,0)[5] WITH ZERO M~ Test <tibble [14 x 4~
## 3 3 <workflo~ PROPHET W/ REGRESSORS Test <tibble [14 x 4~
## 4 4 <workflo~ PROPHET W/ XGBOOST ERRORS Test <tibble [14 x 4~
## 5 5 <workflo~ GLMNET Test <tibble [14 x 4~
## 6 6 <workflo~ RANDOMFOREST Test <tibble [14 x 4~
## 7 7 <workflo~ KERNLAB Test <tibble [14 x 4~
## 8 8 <workflo~ XGBOOST Test <tibble [14 x 4~
Forecasting the validation period and plotting the predictions against the actual data
calibration_table %>%
  modeltime_forecast(actual_data = stock_tbl) %>%
  plot_modeltime_forecast(.interactive = TRUE)
calibration_table %>%
modeltime_accuracy() %>%
table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table:

| .model_id | .model_desc | .type | mae | mape | mase | smape | rmse | rsq |
|---|---|---|---|---|---|---|---|---|
| 1 | ARIMA(0,0,0)(1,0,0)[5] WITH ZERO MEAN | Test | 1.78 | 118.43 | 1.00 | 162.96 | 2.15 | 0.09 |
| 2 | ARIMA(0,0,0)(1,0,0)[5] WITH ZERO MEAN W/ XGBOOST ERRORS | Test | 1.63 | 107.67 | 0.91 | 137.55 | 1.98 | 0.04 |
| 3 | PROPHET W/ REGRESSORS | Test | 2.03 | 171.10 | 1.14 | 167.00 | 2.42 | 0.06 |
| 4 | PROPHET W/ XGBOOST ERRORS | Test | 3.26 | 412.22 | 1.83 | 179.65 | 3.65 | 0.00 |
| 5 | GLMNET | Test | 1.71 | 147.62 | 0.96 | 134.49 | 2.11 | 0.15 |
| 6 | RANDOMFOREST | Test | 1.87 | 182.14 | 1.05 | 159.34 | 2.26 | 0.05 |
| 7 | KERNLAB | Test | 1.82 | 142.74 | 1.02 | 175.55 | 2.12 | 0.22 |
| 8 | XGBOOST | Test | 3.22 | 416.91 | 1.81 | 171.43 | 3.59 | 0.00 |
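For reference, the headline error metrics in the table are simple functions of the test-set residuals. A minimal sketch with toy actual/forecast vectors, following the usual definitions of these metrics:

```r
actual   <- c(1.0, -2.0, 0.5, 1.5)
forecast <- c(0.5, -1.0, 0.0, 2.0)
err <- actual - forecast

mae  <- mean(abs(err))     # mean absolute error
rmse <- sqrt(mean(err^2))  # root mean squared error
# symmetric MAPE, in percent
smape <- 100 * mean(2 * abs(err) / (abs(actual) + abs(forecast)))

round(c(mae = mae, rmse = rmse, smape = smape), 3)
```

The low rsq values across the board are a reminder that differenced daily stock returns are close to noise, so none of the models finds much signal to exploit.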
Refit and Forecast Forward
Finally, modeltime_refit() retrains every model in the table on the full dataset, and modeltime_forecast() projects 2 weeks beyond the last observed date.
calibration_table %>%
modeltime_refit(stock_tbl) %>%
modeltime_forecast(h = "2 weeks", actual_data = stock_tbl) %>%
plot_modeltime_forecast(.interactive = TRUE)
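One caveat: because the models were trained on differenced values, their forecasts are day-over-day changes, not prices. Prices can be recovered by cumulatively summing the differences and adding back the initial value reported by diff_vec() (timetk also provides diff_inv_vec() for this). A base-R sketch using the values from earlier in the report:

```r
initial_value <- 129.410004   # reported by diff_vec() above
diffs <- c(0, 1.60, -4.41)    # first differenced values in stock_tbl
prices <- initial_value + cumsum(diffs)
prices                        # ~ 129.41, 131.01, 126.60
```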